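Low-precision arithmetic has had a transformative effect on the training of neural networks, reducing computation, memory, and energy requirements. Despite this promise, low-precision arithmetic has received little attention for Gaussian processes (GPs), largely because GPs require sophisticated linear algebra routines that are unstable in low precision. We study the different failure modes that can occur when training GPs in half precision. To circumvent these failure modes, we propose a multi-faceted approach involving conjugate gradients with re-orthogonalization, mixed precision, and preconditioning. Our approach significantly improves the numerical stability and practical performance of conjugate gradients in low precision over a wide range of settings, enabling GPs to train on $1.8$ million data points in $10$ hours on a single GPU, without any sparse approximations.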
translated by 谷歌翻译
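A broad class of stochastic volatility models is defined by systems of stochastic differential equations. While these models have seen widespread success in domains such as finance and statistical climatology, they typically lack the ability to condition on historical data to produce a true posterior distribution. To address this fundamental limitation, we show how to recast a class of stochastic volatility models as a hierarchical Gaussian process (GP) model with specialized covariance functions. This GP model retains the inductive biases of the stochastic volatility model while providing the posterior predictive distribution given by GP inference. Within this framework, we take inspiration from well-studied domains to introduce new models, Volt and Magpie, that significantly outperform baselines in stock and wind speed forecasting and extend naturally to the multitask setting.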
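Bayesian optimization (BayesOpt) is a gold standard for query-efficient continuous optimization. However, its adoption for drug design has been hindered by the discrete, high-dimensional nature of the decision variables. We develop a new approach (LaMBO) that jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head, allowing gradient-based optimization of multi-objective acquisition functions in the latent space of the autoencoder. These acquisition functions allow LaMBO to balance the explore-exploit tradeoff over multiple design rounds, and to balance objective tradeoffs by optimizing sequences at many different points on the Pareto frontier. We evaluate LaMBO on two small-molecule design tasks and introduce new tasks optimizing \emph{in silico} and \emph{in vitro} properties. In our experiments, LaMBO outperforms genetic optimizers and does not require a large pretraining corpus, demonstrating that BayesOpt is practical and effective for biological sequence design.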
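While recent work on conjugate gradient (CG) methods and Lanczos decompositions has achieved scalable Gaussian process inference, in several implementations these iterative methods appear to struggle with numerical instabilities in learning kernel hyperparameters and with poor test likelihoods. By investigating CG tolerance, preconditioner rank, and Lanczos decomposition rank, we provide a particularly simple prescription to correct these issues: we recommend a small CG tolerance ($\epsilon \leq 0.01$) and a large root decomposition size ($r \geq 5000$). Moreover, we show that L-BFGS-B is a compelling optimizer for iterative GPs, achieving convergence with fewer gradient updates.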
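To make the role of the CG tolerance concrete, the following is a minimal NumPy sketch of conjugate gradients applied to a small GP-style linear system. It is an illustration under stated assumptions, not the implementation used in the paper: the stopping rule (relative residual norm at most `tol`), the RBF kernel, the noise level, and the names `conjugate_gradients` and `rbf_kernel` are all hypothetical choices made for this example.

```python
import numpy as np

def conjugate_gradients(A, b, tol=0.01, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A.

    Assumed stopping rule: iterate until ||r|| / ||b|| <= tol.
    """
    x = np.zeros_like(b)
    r = b - A @ x              # residual of the current iterate
    p = r.copy()               # first search direction
    rs_old = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= tol * b_norm:
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)          # step length along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p      # next conjugate direction
        rs_old = rs_new
    return x

def rbf_kernel(X, lengthscale=1.0):
    """Squared-exponential kernel matrix (illustrative choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
K = rbf_kernel(X) + 0.1 * np.eye(200)      # kernel matrix plus noise term

loose = conjugate_gradients(K, y, tol=0.5)   # large tolerance, stops early
tight = conjugate_gradients(K, y, tol=0.01)  # the recommended regime
exact = np.linalg.solve(K, y)

print("error with tol=0.5 :", np.linalg.norm(loose - exact))
print("error with tol=0.01:", np.linalg.norm(tight - exact))
```

A looser tolerance stops the iteration earlier, so the approximate solve feeding the hyperparameter gradients is coarser; comparing the two printed errors shows the gap on this toy problem.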
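Physical simulation-based optimization is a common task in science and engineering. Many such simulations produce image- or tensor-based outputs, where the desired objective is a function of those outputs and optimization is performed over a high-dimensional parameter space. We develop a Bayesian optimization method that leverages tensor-based Gaussian process surrogates and trust region Bayesian optimization to effectively model the image outputs and to efficiently optimize these types of simulations, including a radio-frequency tower configuration problem and an optical design problem.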
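With a better understanding of the loss surfaces of multilayer networks, we can build more robust and accurate training procedures. It was recently discovered that independently trained SGD solutions can be connected along one-dimensional paths of near-constant training loss. In this paper, we show that there are mode-connecting simplicial complexes that form multi-dimensional manifolds of low loss, connecting many independently trained models. Inspired by this discovery, we show how to efficiently build simplicial complexes for fast ensembling, outperforming independently trained deep ensembles in accuracy, calibration, and robustness to dataset shift. Notably, our approach requires only a few training epochs to discover a low-loss simplex, starting from a pre-trained solution. Code is available at https://github.com/g-benton/loss-surface-simplexes.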
We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates with a modified learning rate schedule, has recently been shown to improve generalization in deep learning. With SWAG, we fit a Gaussian using the SWA solution as the first moment and a low rank plus diagonal covariance also derived from the SGD iterates, forming an approximate posterior distribution over neural network weights; we then sample from this Gaussian distribution to perform Bayesian model averaging. We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety of tasks, including out of sample detection, calibration, and transfer learning, in comparison to many popular alternatives including MC dropout, KFAC Laplace, SGLD, and temperature scaling.
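The construction described above (first moment from the SGD iterates, a low rank plus diagonal covariance, then sampling for model averaging) can be sketched in NumPy as follows. This is a simplified illustration rather than the authors' code: the iterates are synthetic stand-ins for an SGD trajectory, the deviations are taken from the final mean rather than from a running average, the $1/2$ weighting of the two covariance terms is an assumption of this sketch, and the names `fit_swag` and `sample_swag` are hypothetical.

```python
import numpy as np

def fit_swag(iterates, rank=5):
    """Fit Gaussian moments from weight iterates of shape (T, d).

    Returns the mean (the SWA solution), a diagonal variance estimate,
    and a (d, rank) deviation matrix built from the last `rank` iterates.
    """
    mean = iterates.mean(axis=0)
    second_moment = (iterates ** 2).mean(axis=0)
    # Clip so that round-off can never produce a negative variance.
    diag_var = np.clip(second_moment - mean ** 2, 1e-12, None)
    deviations = (iterates[-rank:] - mean).T
    return mean, diag_var, deviations

def sample_swag(mean, diag_var, deviations, rng):
    """Draw one weight vector from N(mean, diag/2 + D D^T / (2 (rank - 1)))."""
    d, rank = deviations.shape
    z_diag = rng.standard_normal(d)
    z_low_rank = rng.standard_normal(rank)
    return (mean
            + np.sqrt(diag_var / 2.0) * z_diag
            + deviations @ z_low_rank / np.sqrt(2.0 * (rank - 1)))

rng = np.random.default_rng(0)
# Synthetic "SGD iterates" scattered around a 3-dimensional optimum.
iterates = rng.normal(loc=[1.0, -2.0, 0.5], scale=0.1, size=(50, 3))

mean, diag_var, deviations = fit_swag(iterates, rank=5)
samples = np.stack([sample_swag(mean, diag_var, deviations, rng)
                    for _ in range(30)])

# Bayesian model averaging for a toy linear model: average the
# predictions made by each sampled weight vector.
x = np.array([1.0, 1.0, 1.0])
bma_prediction = (samples @ x).mean()
print("model-averaged prediction:", bma_prediction)
```

In a real network the weight vector would be the flattened parameters and each sample would require a forward pass; the averaging step is otherwise the same.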
Our goal is to reconstruct tomographic images with few measurements and a low signal-to-noise ratio. In clinical imaging, this helps to improve patient comfort and reduce radiation exposure. As quantum computing advances, we propose to use an adiabatic quantum computer and associated hybrid methods to solve the reconstruction problem. Tomographic reconstruction is an ill-posed inverse problem. We test our reconstruction technique for image size, noise content, and underdetermination of the measured projection data. We then present the reconstructed binary and integer-valued images of up to 32 by 32 pixels. The demonstrated method competes with traditional reconstruction algorithms and is superior in terms of robustness to noise and reconstructions from few projections. We postulate that hybrid quantum computing will soon reach maturity for real applications in tomographic reconstruction. Finally, we point out the current limitations regarding the problem size and interpretability of the algorithm.
Generalizability of time series forecasting models depends on the quality of model selection. Temporal cross validation (TCV) is a standard technique to perform model selection in forecasting tasks. TCV sequentially partitions the training time series into train and validation windows, and performs hyperparameter optimization (HPO) of the forecast model to select the model with the best validation performance. Model selection with TCV often leads to poor test performance when the test data distribution differs from that of the validation data. We propose a novel model selection method, H-Pro, that exploits the data hierarchy often associated with a time series dataset. Generally, the aggregated data at the higher levels of the hierarchy show better predictability and more consistency than the bottom-level data, which is sparser and sometimes intermittent. H-Pro performs the HPO of the lowest-level student model based on the test proxy forecasts obtained from a set of teacher models at higher levels in the hierarchy. The consistency of the teachers' proxy forecasts helps select better student models at the lowest level. We perform extensive empirical studies on multiple datasets to validate the efficacy of the proposed method. H-Pro, combined with off-the-shelf forecasting models, outperforms existing state-of-the-art forecasting methods, including the winning models of the M5 point-forecasting competition.
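As a reference point for the TCV baseline described above, the following is a minimal pure-Python sketch of expanding-window temporal cross validation used to pick one hyperparameter. It does not implement H-Pro. The expanding-window layout, the moving-average forecaster, the mean absolute error metric, and all function names are assumptions made for illustration.

```python
def temporal_cv_splits(n_obs, n_folds, horizon):
    """Yield (train_idx, val_idx) pairs with expanding training windows.

    Each validation window covers `horizon` steps and always lies
    strictly after its training window.
    """
    first_val_start = n_obs - n_folds * horizon
    if first_val_start <= 0:
        raise ValueError("series too short for the requested folds")
    for k in range(n_folds):
        start = first_val_start + k * horizon
        yield range(0, start), range(start, start + horizon)

def moving_average_forecast(train, window, horizon):
    """Forecast every future step as the mean of the last `window` values."""
    level = sum(train[-window:]) / window
    return [level] * horizon

def select_window(series, candidates, n_folds, horizon):
    """Pick the window size with the lowest mean validation MAE."""
    scores = {}
    for window in candidates:
        fold_errors = []
        for train_idx, val_idx in temporal_cv_splits(len(series), n_folds, horizon):
            train = [series[i] for i in train_idx]
            forecast = moving_average_forecast(train, window, horizon)
            mae = sum(abs(f - series[i]) for f, i in zip(forecast, val_idx)) / horizon
            fold_errors.append(mae)
        scores[window] = sum(fold_errors) / len(fold_errors)
    best = min(scores, key=scores.get)
    return best, scores

# A linear trend: shorter windows track the most recent level best.
series = [float(t) for t in range(20)]
best_window, scores = select_window(series, candidates=[1, 3, 5],
                                    n_folds=3, horizon=2)
print("selected window:", best_window)
```

The selection depends only on the validation windows, which is why it can mislead when the test period behaves differently from them.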
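Deep neural network (DNN) models are increasingly evaluated using new replication test datasets, which have been carefully created to be similar to older and popular benchmark datasets. However, contrary to expectations, DNN classification models show significant, consistent, and largely unexplained degradation in accuracy on these replication test datasets. While the popular evaluation approach is to assess the accuracy of a model using all the data points available in the respective test datasets, we argue that doing so hinders us from adequately capturing the behavior of DNN models and from forming realistic expectations about their accuracy. Therefore, we propose a principled evaluation protocol that is suitable for comparative investigations of the accuracy of a DNN model on multiple test datasets, leveraging subsets of data points that can be selected using different criteria, including uncertainty-related information. Using this new evaluation protocol, we determined the accuracy of $564$ DNN models on both (1) the CIFAR-10 and ImageNet datasets and (2) their replication datasets. Our experimental results indicate that the observed accuracy degradation between established benchmark datasets and their replications is consistently lower (that is, models perform better on the replication test datasets) than the accuracy degradation reported in published works, which rely on conventional evaluation approaches that do not utilize uncertainty-related information.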